E-commerce is a trending business sector, of electronically buying and selling products over the internet. This fastgrowing industry has also adapted features like online payments, internet marketing, door-step delivery, returns and exchanges, etc. Since it is a widespread industry that adds to human convenience and luxuries, there is a lot of scope for improvements and business growth, which brings profits to both consumers and industry.
This is a Brazilian ecommerce public dataset of orders made at Olist Store. The dataset has information of 100k orders from 2016 to 2018 made at multiple marketplaces in Brazil. The Dataset contains multiple tables providing information on actual ecommerce purchases along with various other dimensions of a particular purchase like Customer Information, Seller Information, Product Metadata as well as Customer reviews on their purchase experience. The dataset also has geolocation file that relates Brazilian zip codes to latitude/longitude coordinates.
custplot <- ordertable %>%
filter(order_status %in% c("delivered", "canceled"))%>%
ggplot(aes(x=customer_state, fill=order_status)) +
geom_bar(show.legend=FALSE)+facet_wrap(~order_status, scales = "free")+
labs(x="Customer states in brazil", y="No. of orders of different status for customers in different states" , title = "Order status plot of customers of different state")
ggplotly(custplot)
g <- ggplot() +
geom_sf(data=state, fill="#2D3E50", color="#FEBF57", size=.15, show.legend = FALSE)
print(g)
#all states
a <- g + geom_point(ordergeotable, mapping = aes(x=geolocation_lng, y=geolocation_lat, color = customer_state), position = "jitter", size=.15, alpha=1/2)+
coord_sf(xlim = c(-75,-30), ylim = c(-40,5), expand = FALSE)+
labs(y="Latitude of location.", x="Longitude of location.",title="Mapping of different states of brazil in Map")
plot(a)
freq_count <- as.data.frame(table(ordertable$customer_unique_id))
odtbl <- ordertable %>% select_at(vars(year, month, customer_unique_id)) %>%
distinct_at(vars(year,month, customer_unique_id)) %>%
arrange_at(vars(year, month)) %>%
count_(vars(year, month)) %>%
ggplot() + geom_line(mapping = aes(x = month, y = n, group = 1, color="red"),show.legend = FALSE) +facet_wrap(~year)+geom_point(mapping = aes(x = month, y = n, group = 1, color="red",show.legend=FALSE),show.legend = FALSE)+labs(x="Month", y="Count of new customers added each month", title = "New customers purchased every consecutive month")
ggplotly(odtbl)
a <- items_data %>% left_join(orders_data) %>% mutate(mny = format(strptime(order_purchase_timestamp, "%Y-%m-%d %H:%M:%S"),'%Y-%m')) %>% group_by(mny) %>% summarise(total = sum(price)) %>% arrange(mny)
brazil_holidays <- brazil_holidays_data %>% mutate(mny = strftime(Date, format = "%Y-%m")) %>% group_by(mny) %>% mutate(holidays_by_week = paste0(Holiday, collapse = ",")) %>% select(mny, holidays_by_week)
brazil_holidays <- brazil_holidays[!duplicated(brazil_holidays$mny),]
p <- plot_ly(a, x = ~mny, y = ~total, type = 'scatter', mode = 'lines')
p <- p %>%
add_trace(
type = 'bar',
x = brazil_holidays$mny,
y = 1000000,
text = brazil_holidays$holidays_by_week,
hoverinfo = 'text',
marker = list(color='yellow'),
showlegend = F,
width = 0.3
) %>% layout(xaxis = list(autotick = F, dtick = 1)) %>%
layout(title = 'Monthly sale along with yearly holidays in Brazil',
xaxis = list(title = 'Time (Year - Month)',
autotick = F, dtick = 1),
yaxis = list(title = 'Total Purchase (in $)'))
p
Clearly, the sale increases significantly when there is an event during a particular month.
order_weekday <- orders_data %>% mutate(purchase_weekday = wday(order_purchase_timestamp), purchase_hour = format(strptime(order_purchase_timestamp, "%Y-%m-%d %H:%M:%S"),'%H')) %>% group_by(purchase_weekday, purchase_hour) %>% summarise(total_transactions = n())
p <- plot_ly(data = order_weekday,
x = ~purchase_hour,
y = ~purchase_weekday,
z = ~total_transactions,
type = "heatmap",
width = 1050,
height = 500,
colors = colorRamp(c("white","yellow", "red"))) %>%
layout(title = 'Transactions over the hour by day',
xaxis = list(title = 'Hour'),
yaxis = list(title = 'Day', tickvals = c(1, 2, 3, 4, 5, 6, 7), ticktext = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")),
legend = list(title = "Total Transactions")) %>%
add_annotations(x = order_weekday$purchase_hour, y = order_weekday$purchase_weekday, text = order_weekday$total_transactions, xref = 'x', yref = 'y', showarrow = FALSE, font=list(color='black'))
p
The heatmap indiciates that the majority of the transactions occur over the weekdays during office hours. We can thus increases our marketing strategies during this period.
payment_sum <- payments_data %>% filter(payment_type != "not_defined") %>% group_by(payment_type) %>% summarise(sum = sum(payment_value))
payment_count <- payments_data %>% filter(payment_type != "not_defined") %>% group_by(payment_type) %>% summarise(count = n())
p <- plot_ly() %>%
add_pie(data = payment_count, labels = ~payment_type, values = ~count, domain = list(x = c(0, 0.4), y = c(0.4, 1))) %>%
add_pie(data = payment_sum, labels = ~payment_type, values = ~sum, domain = list(x = c(0.6, 1), y = c(0.4, 1))) %>%
layout(title = "Number of payments vs Total payment values", showlegend = F,
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
p
The market in brazil is heavily dominated by credit card payments. Credit cards are used in ~74% of the total transactions amounting to ~78% of the total market revenue for Olist.
payment_group <- payments_data %>% filter(payment_type != "not_defined") %>% group_by(order_ID, payment_type) %>% summarise(count = n())
payment_order_group <- payment_group %>% left_join(orders_data) %>% select(order_ID, payment_type, count, order_purchase_timestamp) %>% mutate(purchase_mny = format(strptime(order_purchase_timestamp, "%Y-%m-%d %H:%M:%S"),'%Y-%m'))
payment_abc <- payment_order_group %>% group_by(purchase_mny, payment_type) %>% summarise(total_count = n())%>% ungroup()
p <- plot_ly(payment_abc, x = ~purchase_mny, y = ~total_count, color = ~payment_type, type = 'scatter', mode = 'lines+markers') %>% layout(title = 'Payment type growth monthly',
xaxis = list(title = 'Time (Year - Month)'),
yaxis = list(title = 'Total number of transactions'))
p